Interpreting the yield of transit surveys:
Are there groups in the known transiting planets population?
ABSTRACT

Context. Each transiting planet discovered is characterized by 7 measurable quantities that may or may not be linked together. These
include those relative to the planet (mass, radius, orbital period, and equilibrium temperature) and those relative to the star (mass,
radius, effective temperature, and metallicity). Correlations between planet mass and period, surface gravity and period, and planet radius
and star temperature have been previously observed among the 31 known transiting giant planets. Two classes of planets have been
previously identified based on their Safronov number.

Aims. We use CoRoTlux transit survey simulations to compare simulated events to the sample of discovered planets and test the statistical
significance of these correlations. Using a model shown to match the yield of the OGLE transit survey, we generate a large
sample of simulated detections, in which we can statistically test the different trends observed in the small sample of known transiting
planets.

Methods. We first generate a stellar field with planetary companions based on radial velocity discoveries, use a planetary evolution 
model assuming a variable fraction of heavy elements to compute the characteristics of transit events, then apply a detection criterion 
that includes both statistical and red noise sources. We compare the yield of our simulated survey with the ensemble of 31 well- 
characterized giant transiting planets, using different statistical tools, including a multivariate logistic analysis to assess whether the 
simulated distribution matches the known transiting planets. 

Results. Our results satisfactorily match the distribution of known transiting planets characteristics. Our multivariate analysis shows
that our simulated sample and observations are consistent to 76%. The mass vs. period correlation for giant planets first observed with 
radial velocity holds with transiting planets. The correlation between surface gravity and period can be explained as the combined 
effect of the mass vs. period lower limit and by the decreasing transit probability and detection efficiency for longer periods and higher 
surface gravity. Our model also naturally explains other trends, like the correlation between planetary radius and stellar effective 
temperature. Finally, we are also able to reproduce the previously observed apparent bimodal distribution of planetary Safronov 
numbers in 10% of our simulated cases, although our model predicts a continuous distribution. This shows that the evidence for the 
existence of two groups of planets with different intrinsic properties is not statistically significant. 

Key words. extrasolar giant planets - planet formation



1. Introduction



The number of giant transiting exoplanets discovered is increas- 
ing rapidly and amounts to 32 at the date of this writing. The 
ability to measure the masses and radii of these objects provides 
us with a unique possibility to determine their composition and 
to test planet formation models. Although uncertainties on stel- 
lar and planetary characteristics do not allow determining the 
precise composition of planets individually, a lot is to be learned 
from a global, statistical approach. 

A particularly intriguing observation made by
Hansen & Barman (2007) from an examination of a set of
18 transiting planets known at that time is the apparent grouping
of objects in two categories based on their Safronov number.

The Safronov number is defined as:

$$\theta = \frac{1}{2}\left(\frac{V_{\rm esc}}{V_{\rm orb}}\right)^2 = \frac{a}{R_p}\,\frac{M_p}{M_\star} \qquad (1)$$

where V_esc is the escape velocity from the surface of the planet
and V_orb is the orbital velocity of the planet around its host star,
a is the semi-major axis, M_p and M_⋆ are the respective masses of
the planet and its host star, and R_p is the radius of the planet. It
is indicative of the efficiency with which a planet scatters other
bodies, and could play an important role in understanding
processes that affected planet formation.
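
As a worked illustration of the Safronov number, the short sketch below evaluates it in two ways (from the velocity ratio and from the length and mass ratios) and checks that both agree. The input values are those listed for HD209458b and its host star in the tables of this article; the numerical constants for the solar mass and the astronomical unit, and the function names, are assumptions made for this example only.

```python
import math

G = 6.674e-11          # gravitational constant [m^3 kg^-1 s^-2]
M_JUP = 1.8986112e27   # Jupiter mass [kg]
R_JUP = 7.1492e7       # Jupiter equatorial radius [m]
M_SUN = 1.989e30       # solar mass [kg] (assumed value)
AU = 1.495978707e11    # astronomical unit [m] (assumed value)


def safronov_from_ratios(a_au, mp_mjup, rp_rjup, mstar_msun):
    """theta = (a / R_p) * (M_p / M_star)."""
    length_ratio = (a_au * AU) / (rp_rjup * R_JUP)
    mass_ratio = (mp_mjup * M_JUP) / (mstar_msun * M_SUN)
    return length_ratio * mass_ratio


def safronov_from_velocities(a_au, mp_mjup, rp_rjup, mstar_msun):
    """theta = 0.5 * (V_esc / V_orb)^2, assuming a circular orbit."""
    v_esc_sq = 2.0 * G * mp_mjup * M_JUP / (rp_rjup * R_JUP)
    v_orb_sq = G * mstar_msun * M_SUN / (a_au * AU)
    return 0.5 * v_esc_sq / v_orb_sq


# HD209458b: a = 0.047 AU, Mp = 0.657 MJup, Rp = 1.320 RJup, M_star = 1.101 Msun
theta = safronov_from_ratios(0.047, 0.657, 1.320, 1.101)
theta_check = safronov_from_velocities(0.047, 0.657, 1.320, 1.101)
print(f"theta = {theta:.4f}")
```

With these inputs the planet falls below the value of 0.05 that is used later in the article to separate the two groups of planets.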

If real, this division into two groups would probably imply 
the existence of different formation or accretion mechanisms, or 
alternatively require revised evolution models. 

Other puzzling observations include the possible trends
between planet mass and orbital period (Mazeh et al. 2005) and
between gravity and orbital period (Southworth et al. 2007, first
mentioned by R. Noyes in 2006).

In a previous article (Fressin et al. 2007, hereafter Paper I),
we presented CoRoTlux, a tool to model statistically a
population of stars and planets and compare it to the ensemble of
detected transiting planets. We showed the results to be in very
good agreement with the 14 planets known at that time.

In the present article, we examine whether these trends and 
groups can be explained in the framework of our model or 
whether they imply the existence of more complex physical 
mechanisms for the formation or evolution of planets that are not 
included in present models. We first describe our model and an
updated global statistical analysis of the results including 17 new
planets discovered thus far (§ 2). We then examine the trends
between mass, gravity and orbital period (§ 3), the grouping in
terms of planetary radius and stellar effective temperature (§ 4),
and finally the grouping in terms of Safronov number (§ 5).



2. Method and result update 

2.1. Principle of the simulations 

As described in more detail in Paper I, the generation of a pop- 
ulation of transiting planets with CoRoTlux involves the follow- 
ing steps: 

1. We generate a population of stars from the Besançon catalog
(Robin et al. 2003);

2. Stellar companions (doubles, triples) are added using
frequencies of occurrence and period distributions based on
Duquennoy & Mayor (1991);

3. Planetary companions with random orbital inclinations are
generated with a frequency of occurrence that depends on the
host star metallicity with the relation derived by Santos et al.
(2004). The parameters of the planets (period, mass,
eccentricity) are derived by cloning the known radial-velocity
(hereafter RV) list of planets (we use J. Schneider's planet
encyclopaedia: www.exoplanets.eu). We consider only
planets above 0.3 times the mass of Jupiter, which yields a list of
229 objects. This mass cut-off is chosen from radial velocity
analysis (Fischer & Valenti 2005), as their planetary
occurrence law is considered unbiased down to this limit. Because
of a strong bias of transit surveys towards extremely short 
orbital periods P (less than 2 days), we add to the list clones 
which are drawn from the short-orbit planets found from 
transiting surveys. The probabilities are adjusted so that on 
average ~ 3 transit-planet clones with P < 2 days are added 
to the RV list of 229 giant planets. This number is obtained 
by maximum likelihood on the basis of the OGLE survey to 
reproduce both the planet populations at very-short periods 
that are not constrained by RV measurements and the ones 
with longer periods that are discovered by both types of sur- 
veys (see Paper I). 

4. We compute planetary radii using a structure and evolution
model that is adjusted to fit the radii distribution of known
transiting planets: the planetary core mass is assumed to be
a function of the stellar metallicity, and the evolution is
calculated by including an extra-heat source term equal to 1%
of the incoming stellar heat flux (Guillot et al. 2006; Guillot
2008)^;

5. We determine which transiting planets are detectable, given
an observational duty cycle and a level of white and red noise
estimated a posteriori (Pont et al. 2006). We also use a
cut-off in stellar effective temperature T_eff,cut above which we
consider that it will be too difficult for RV techniques to
confirm an event. We choose T_eff,cut = 7200 K as a fiducial value.
(This value is an estimate of the limit for T_eff used by the
OGLE follow-up group (F. Pont, pers. communication); in
practice it has little consequence on the results).
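
The role of the random orbital inclinations drawn in step 3 can be illustrated with a minimal sketch. It is not the CoRoTlux implementation: it assumes circular orbits, isotropic orientations (cos i uniform between 0 and 1), and counts a transit whenever the projected separation at conjunction is smaller than the stellar radius. The function names and the nominal solar radius are hypothetical choices for this example; the orbital distance and stellar radius are those listed for HD209458 in the tables of this article.

```python
import random

R_SUN = 6.957e8        # solar radius [m] (assumed nominal value)
AU = 1.495978707e11    # astronomical unit [m] (assumed value)


def impact_parameter(a_au, rstar_rsun, cos_i):
    """Projected star-planet separation at conjunction, in stellar radii."""
    return a_au * AU * cos_i / (rstar_rsun * R_SUN)


def transit_fraction(a_au, rstar_rsun, n_draws=200_000, seed=42):
    """Monte Carlo fraction of randomly oriented circular orbits that transit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        cos_i = rng.random()  # isotropic orientations: cos(i) uniform in [0, 1)
        if impact_parameter(a_au, rstar_rsun, cos_i) <= 1.0:
            hits += 1
    return hits / n_draws


# HD209458: a = 0.047 AU, R_star = 1.125 Rsun
frac = transit_fraction(0.047, 1.125)
expected = 1.125 * R_SUN / (0.047 * AU)  # geometric probability R_star / a
print(f"simulated: {frac:.3f}, geometric: {expected:.3f}")
```

The simulated fraction converges to the geometric probability R_⋆/a, which is why transit surveys favour short-period planets even before any noise criterion is applied.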

In order to analyze the complete yield of transit discoveries
properly, we should simulate each successful survey
(OGLE: e.g. see Udalski (2003); HATnet: e.g. see Bakos et al.
(2006); TrES: e.g. see Alonso et al. (2004); SWASP: e.g. see
Collier Cameron et al. (2006); XO: e.g. see McCullough et al.
(2006)) one by one. However, we take advantage of the fact
that these different ground-based surveys have similar
observation biases and similar noise levels (e.g. the red noise level
for SWASP (Smith et al. 2006) is close to the one of OGLE
(Pont et al. 2006), although their instruments and target
magnitude range are different). As a consequence, one can notice that
in terms of transit depth and period distribution of detected
transiting planets, these surveys achieve very similar performances.
Therefore, as in Paper I, we base our model parameters (stellar
fields, duty cycle, red noise level) on OGLE parameters (Udalski
2003; Bouchy et al. 2004; Pont et al. 2005).

^ An electronic version of the table of simulated planets used to
extrapolate radii is available at www.obs-nice.fr/guillot/pegasids/



2.2. The known transiting giant planets

Our results will be systematically compared to the sample of 31
transiting giant planets that are known at the date of this writing.
These include in particular:

- 22 planets for which the refined parameters based on the
uniform analysis of transit light curves and the observable
properties of the host stars have been generically updated by
Torres et al. (2008). We exclude the sub-giant Hot Neptune
GJ-436 b that does not fit our mass criterion and is
undetectable by current ground-based generic surveys.

- 9 planets recently discovered and not included in Torres et al.
(2008). The characteristics of these planets have not been
refined and are to be considered with more caution. Among
these planets, we added the first two discoveries of the
CoRoT satellite. Although CoRoT has significantly higher
photometric precision and is better suited for finding longer
period planets than ground-based surveys, we included
both CoRoT-Exo-1b (Barge et al. 2008) and CoRoT-Exo-2b
(Alonso et al. 2008) in our analysis, as they are the two
deepest planet candidates of the initial run of the satellite and
have similar periods and transit depths to planets discovered
from ground-based surveys.

The characteristics of the transiting planets are shown in Table 1
and those of their host stars in Table 2. These tables are used
for testing our model. Where the stellar metallicity is unknown,
we arbitrarily used solar metallicity (see below and the appendix
for a discussion).

2.3. A new metallicity distribution for stars hosting planets 

In Paper I, we had concluded that the metallicity distribution of
stars with Pegasids (planets with masses between 0.3 and 15 M_Jup
and periods P < 10 days) was significantly different from that
of stars with planets having longer orbital periods. This was
based on three facts:

- The list of radial-velocity planets known showed a lack of
giant planets with short orbital periods around metal-poor
stars. Among 25 Pegasids, none were orbiting stars with
[Fe/H] < -0.07, contrary to planets on longer orbits found
also around metal-poor stars.

- The list of transiting planets also showed a lack of
planets around metal-poor stars, with stellar metallicities ranging
from -0.03 to 0.37 ([-0.08, 0.44] with error bars).

- The population of transiting planets generated with
CoRoTlux was found to systematically underpredict stellar
metallicities compared to the sample of observed transiting
planets. The period vs. metallicity diagram thus formed was
found to be 2.9σ away from the maximum likelihood of
simulated planets position in the diagram (see Paper I)^.
On the other hand, a similar calculation done by splitting
the RV list in a low-metallicity part ([Fe/H] < -0.07) and a
high-metallicity part (with two different period distributions
for simulated planets as a function of their host star
metallicity) would end in a period vs. metallicity diagram in good
agreement with the observations (0.4σ from the maximum
likelihood).

Table 1. Characteristics of the transiting planets included in this study

Name | Mp [MJup] | Rp [RJup] | P [days] | T [JD - 2450000] | i [°] | a [AU] | Reference
HD17156b | 3.13 ± 0.21 | 1.21 ± 0.12 | 21.21691 ±_71 | 4374.8338 ±_?0 | 86.5 +?/-? | 0.15 | [Barbieri07] Fischer07/Irwin08
HD147506b** | 8.04 ± 0.40 | 0.98 ± 0.04 | 5.63341 ±_13 | 4212.8561 ±_6 | > 86.8 | 0.0685 | [Bakos07] Winn07*
HD149026b | 0.36 ± 0.03 | 0.71 ± 0.05 | 2.8758882 ±_61 | 4272.7301 ±_13 | 90 ± 3.1 | 0.0432 | [Sato05] Winn07*
HD189733b | 1.15 ± 0.04 | 1.154 ± 0.017 | 2.218581 ±_2 | 3931.12048 ±_2 | 85.68 ± 0.04 | 0.031 | [Bouchy05] Pont07*
HD209458b | 0.657 ± 0.006 | 1.320 ± 0.025 | 3.52474859 ±_38 | 2826.628521 ±_87 | 86.929 ± 0.010 | 0.047 | [Charbonneau00] Winn05/Knutson06*
TrES-1 | 0.76 ± 0.05 | 1.081 ± 0.029 | 3.0300737 ±_26 | 3186.80603 ±_?8 | > 88.4 | 0.0393 | [Alonso04] Sozzetti04/Winn07*
TrES-2 | 1.198 ± 0.053 | 1.220 +?/-? | 2.47063 ±_1 | 3957.6358 ±_10 | 83.90 ± 0.22 | 0.0367 | [ODonovan06] Sozzetti07*
TrES-3 | 1.92 ± 0.23 | 1.295 ± 0.081 | 1.30619 ±_1 | 4185.9101 ±_3 | 82.15 ± 0.21 | 0.0226 | [ODonovan07]*
TrES-4 | 0.84 ± 0.20 | 1.674 ± 0.094 | 3.553945 ±_75 | 4230.9053 ±_5 | 82.81 ± 0.33 | 0.0488 | [Mandushev07]*
XO-1b | 0.90 ± 0.07 | 1.184 +?/-? | 3.941534 ±_27 | 3887.74679 ±_15 | 89.36 +?/-? | 0.0488 | [McCullough06] Holman06*
XO-2b | 0.57 ± 0.06 | 0.973 +?/-? | 2.615838 ±_8 | 4147.74902 ±_?0 | > 88.35 | 0.037 | [Burke07]*
XO-3b | 13.25 ± 0.64 | 1.1-2.1 | 3.19154 ±_14 | 4025.3967 ±_?8 | 79.32 ± 1.36 | 0.0476 | [Johns-Krull08]
HAT-P-1b | 0.53 ± 0.04 | 1.203 ± 0.051 | 4.46529 ±_9 | 3997.79258 ±_?4 | 86.22 ± 0.24 | 0.0551 | [Bakos07] Winn07*
HAT-P-3b | 0.599 ± 0.026 | 0.890 ± 0.046 | 2.899703 ±_54 | 4218.7594 ±_?9 | 87.24 ± 0.69 | 0.0389 | [Torres07]*
HAT-P-4b | 0.68 ± 0.04 | 1.27 ± 0.05 | 3.056536 ±_57 | 4245.8154 ±_3 | 89.9 +?/-? | 0.0446 | [Kovacs07]*
HAT-P-5b | 1.06 ± 0.11 | 1.26 ± 0.05 | 2.788491 ±_25 | 4241.77663 ±_?2 | 86.75 ± 0.44 | 0.0407 | [Bakos07]*
HAT-P-6b | 1.057 ± 0.119 | 1.330 ± 0.061 | 3.852985 ±_5 | 4035.67575 ±_?8 | 85.51 ± 0.35 | 0.0523 | [Noyes07]*
WASP-1b | 0.867 ± 0.073 | 1.443 ± 0.039 | 2.519961 ±_18 | 4013.31269 ±_47 | > 86.1 | 0.0382 | [Cameron06] Shporer06/Charbonneau06*
WASP-2b | 0.81-0.95 | 1.038 ± 0.050 | 2.152226 ±_4 | 4008.73205 ±_28 | 84.74 ± 0.39 | 0.0307 | [Cameron06] Charbonneau06*
WASP-3b | 1.76 ± 0.11 | 1.31 +?/-? | 1.846834 ±_2 | 4143.8503 ±_4 | 84.4 +?/-? | 0.0317 | [Pollacco07]
WASP-4b | 1.27 ± 0.09 | 1.45 +?/-? | 1.338228 ±_3 | 4365.91475 ±_?5 | 87.54 +?/-? | 0.023 | [Wilson08]
WASP-5b | 1.58 ± 0.11 | 1.090 +?/-0.058 | 1.6284296 ±_42 | 4375.62466 ±_?6 | > 85.0 | 0.0268 | [Anderson08]
CoRoT-Exo-1b | 1.03 ± 0.12 | 1.49 ± 0.08 | 1.5089557 ±_64 | 4159.4532 ±_1 | 85.1 ± 0.5 | 0.025 | [Barge08]
CoRoT-Exo-2b | 3.31 ± 0.16 | 1.465 ± 0.029 | 1.7429964 ±_17 | 4237.53562 ±_14 | 87.84 ± 0.10 | 0.028 | [Alonso08]
OGLE-TR-10b | 0.61 ± 0.13 | 1.122 +?/-? | 3.101278 ±_4 | 3890.678 ±_1 | 87.2 - 90 | 0.0416 | [Konacki05] Pont07/Holman07*
OGLE-TR-56b | 1.29 ± 0.12 | 1.30 ± 0.05 | 1.211909 ±_1 | 3936.598 ±_1 | 81.0 ± 2.2 | 0.0225 | [Konacki03] Torres04/Pont07*
OGLE-TR-111b | 0.52 ± 0.13 | 1.01 ± 0.04 | 4.0144479 ±_41 | 3799.7516 ±_2 | 88.1 ± 0.5 | 0.0467 | [Pont04] Santos06/Winn06/Minniti07*
OGLE-TR-113b | 1.32 ± 0.19 | 1.09 ± 0.03 | 1.4324757 ±_13 | 3464.61665 ±_10 | 88.8 - 90 | 0.0229 | [Bouchy04] Bouchy04/Gillon06*
OGLE-TR-132b | 1.14 ± 0.12 | 1.18 ± 0.07 | 1.689868 ±_3 | 3142.5912 ±_3 | 81.5 ± 1.6 | 0.0299 | [Bouchy04] Gillon07*
OGLE-TR-182b | 1.01 ± 0.15 | 1.13 +?/-? | 3.97910 ±_1 | 4270.572 ±_2 | 85.7 ± 0.3 | 0.051 | [Pont08]
OGLE-TR-211b | 1.03 ± 0.20 | 1.36 +?/-? | 3.67724 ±_3 | 3428.334 ±_3 | > 82.7 | 0.051 | [Udalski07]

Underscores indicate uncertainties on last printed digits. Bracket = announcement paper. No bracket = reference from which most parameters have been chosen. * = also in Torres et al. (2008). A "?" marks a character that is illegible in the source.

M_Jup = 1.8986112 × 10^30 g is the mass of Jupiter. R_Jup = 71,492 km is Jupiter's equatorial radius.

References: Charbonneau et al. (2000); Konacki et al. (2003); Bouchy et al. (2004); Pont et al. (2004); Torres et al. (2004); Alonso et al. (2004); Sozzetti et al. (2004); Sato et al. (2005);
Bouchy et al. (2005); Winn et al. (2005); O'Donovan et al. (2006); Collier Cameron et al. (2006); Knutson et al. (2007); Gillon et al. (2006); Charbonneau et al. (2006); Holman et al.
(2006); Shporer et al. (2007); Winn et al. (2007b,c); Bakos et al. (2007); Burke et al. (2007); O'Donovan et al. (2007); Mandushev et al. (2007); Torres et al. (2007); Pont et al. (2007);
Gillon et al. (2007); Minniti et al. (2007); Winn et al. (2007a); Kovacs et al. (2007)
The table is derived from Frédéric Pont's web site: http://www.inscience.ch/transits/
** HD 147506 is also called HAT-P-2

On the basis of an additional 51 RV giant planets and 17
transiting planets discovered since Paper I, we must now
reexamine this conclusion. Indeed, the average metallicity of stars
harboring transiting planets has evolved. The OGLE survey was
characterized by a surprisingly high value ([Fe/H] = 0.24). The
planets discovered since have significantly lower metallicities
(an average of [Fe/H] = 0.07). Finally, TrES-2, TrES-3, XO-3,
HAT-P-6 and CoRoT-Exo-1 all appear to have metallicities
lower than -0.07.

^ Paper I shows how we estimate the deviation of real planets from
the maximum likelihood of the model: in each 2-parameter space, we
bin our data on a 20x20 grid as a compromise between resolution of the
models and characteristic variations of the parameters. The probability
of an event in each bin is considered equal to the normalized number
of draws in that bin in our large model sample. The likelihood of a
31-planet draw is the sum of the logarithms of the individual
probabilities of its events. We estimate the standard deviation of 1000 random
31-event draws among the model detections sample, and calculate the
deviation to maximum likelihood of the known planets as a function of
this standard deviation.

In Paper I, the metallicity distribution of simulated stars
was based on that extracted from the photometric
observation of the solar neighborhood of the Geneva-Copenhagen survey
(Nordström et al. 2004). This metallicity distribution is in fact
centred 0.1 dex lower (-0.14 instead of -0.04) than the one
observed using spectrometry by RV surveys (Fischer & Valenti
2005; Santos et al. 2004). Since the latter two works are used to
derive the frequency of stars bearing planets, we now choose to
also use these for the metallicity distribution of stars in our fields.
More specifically, our metallicity distribution law and the planet
occurrence rate are obtained by combining the Santos et al.
(2004) and the Fischer & Valenti (2005) surveys. Figure 1 shows
the metallicity distribution and planet occurrence that result
directly from these hypotheses.
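
The way a planet occurrence law reshapes the metallicity distribution of host stars can be illustrated with a short sketch. The combined law used in this work is not given in closed form here, so the example substitutes the power law published by Fischer & Valenti (2005), P = 0.03 × 10^(2 [Fe/H]), as a stand-in; the Gaussian stellar distribution (centred on -0.04 with a width of 0.2 dex) and the function names are purely hypothetical choices for the illustration.

```python
import math


def planet_fraction(feh):
    """Fischer & Valenti (2005) power law, used as an illustrative stand-in."""
    return 0.03 * 10.0 ** (2.0 * feh)


def gaussian_pdf(mean, sigma):
    """Return an unnormalized-grid-friendly Gaussian density function."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def host_distribution(feh_grid, star_pdf):
    """Stellar distribution weighted by planet occurrence, normalized on the grid."""
    weights = [star_pdf(f) * planet_fraction(f) for f in feh_grid]
    total = sum(weights)
    return [w / total for w in weights]


grid = [-0.5 + 0.05 * k for k in range(21)]   # [Fe/H] from -0.5 to 0.5
stars = gaussian_pdf(-0.04, 0.2)              # hypothetical field-star distribution
hosts = host_distribution(grid, stars)

star_norm = sum(stars(f) for f in grid)
mean_star = sum(f * stars(f) for f in grid) / star_norm
mean_host = sum(f * h for f, h in zip(grid, hosts))
print(f"mean [Fe/H]: stars {mean_star:.2f}, planet hosts {mean_host:.2f}")
```

Because the occurrence rate rises steeply with metallicity, the planet-hosting stars are on average more metal-rich than the field stars they are drawn from, which is the behaviour shown by the host-star curve of Fig. 1.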

As a consequence, we find that this improved distribution
of stellar metallicities, together with the new sample of observed
planets, alleviates the need for advocating a distinction in
metallicities between stars harboring short-period giant planets and
stars that harbor planets on longer periods. Quantitatively, our
new metallicity vs. period diagram is at 1.09σ of the maximum
likelihood. We therefore conclude that, contrary to Paper I, there
is no statistically significant bias between the planet orbital period
and the stellar metallicity in the observed exoplanet sample.
Table 2. Characteristics of stars hosting the transiting planets included in this study

Name | Vmag | M* [Msun] | R* [Rsun] | Teff [K] | [Fe/H] | Reference
HD17156 | 8.2 | 1.2 ± 0.1 | 1.47 ± 0.08 | 6079 ± 56 | 0.24 ± 0.03 | Fischer07/Irwin08
HD147506 | 8.7 | 1.32 ± 0.08 | 1.48 ± 0.05 | 6290 ± 110 | 0.12 ± 0.08 | Bakos07/Winn07*
HD149026 | 8.2 | 1.3 ± 0.06 | 1.45 ± 0.1 | 6147 ± 50 | 0.36 ± 0.05 | Sato05/Winn07*
HD189733 | 7.7 | 0.82 ± 0.03 | 0.755 ± 0.011 | 5050 ± 50 | -0.03 ± 0.04 | Bouchy05/Pont07*
HD209458 | 7.7 | 1.101 ± 0.064 | 1.125 ± 0.022 | 6117 ± 26 | 0.02 ± 0.03 | Sozzetti04/Knutson06*
TrES-1 | 11.8 | 0.89 ± 0.035 | 0.811 ± 0.020 | 5250 ± 75 | -0.02 ± 0.06 | Sozzetti04,06/Winn07*
TrES-2 | 11.4 | 0.98 ± 0.062 | 1.000 +?/-? | 5850 ± 50 | -0.15 ± 0.10 | Sozzetti07*
TrES-3 | 12.4 | 0.90 ± 0.15 | 0.802 ± 0.046 | 5720 ± 150 | | O'Donovan07*
TrES-4 | 11.6 | 1.22 ± 0.17 | 1.738 ± 0.092 | 6100 ± 150 | | Mandushev07*
XO-1 | 11.5 | 1.0 ± 0.03 | 0.928 ± 0.015 | 5750 ± 13 | 0.015 ± 0.03 | McCullough06/Holman06*
XO-2 | 11.2 | 0.98 ± 0.02 | 0.964 +?/-? | 5340 ± 32 | 0.45 ± 0.02 | Burke07*
XO-3 | 9.8 | 1.41 ± 0.08 | 1.377 ± 0.083 | 6429 ± 50 | -0.18 ± 0.03 | Johns-Krull07
HAT-P-1 | 10.4 | 1.12 ± 0.09 | 1.115 ± 0.043 | 5975 ± 45 | 0.13 ± 0.02 | Bakos07/Winn07*
HAT-P-3 | 11.9 | 0.936 +?/-? | 0.824 +?/-? | 5185 ± 46 | 0.27 ± 0.04 | Torres07*
HAT-P-4 | 11.2 | 1.26 +?/-? | 1.59 ± 0.07 | 5860 ± 80 | 0.24 ± 0.08 | Kovacs07*
HAT-P-5 | 12.0 | 1.160 ± 0.062 | 1.167 ± 0.049 | 5960 ± 100 | 0.24 ± 0.15 | Bakos07*
HAT-P-6 | 10.4 | 1.29 ± 0.06 | 1.46 ± 0.06 | 6570 ± 80 | -0.13 ± 0.08 | Noyes07*
WASP-1 | 11.8 | 1.15 +?/-? | 1.453 ± 0.032 | 6110 ± 45 | 0.23 ± 0.08 | Cameron06/Charbonneau06/Stempels07*
WASP-2 | 12.0 | 0.79 +?/-? | 0.813 ± 0.032 | 5200 ± 200 | | Cameron06/Charbonneau06*
WASP-3 | 10.5 | 1.24 +?/-? | 1.31 +?/-0.12 | 6400 ± 100 | 0.0 ± 0.2 | Pollacco07
WASP-4 | 12.5 | 0.90 ± 0.08 | ? | 5500 ± 150 | 0.0 ± 0.2 | Wilson08
WASP-5 | 12.3 | 0.97 ± 0.09 | 1.026 +?/-? | 5700 ± 150 | 0.0 ± 0.2 | Anderson08
CoRoT-Exo-1 | 13.6 | 0.95 ± 0.15 | 1.11 ± 0.05 | 5950 ± 150 | -0.3 ± 0.25 |

Underscores indicate uncertainties on last printed digits. * = also in Torres et al. (2008). A "?" marks a character that is illegible in the source.

References: Charbonneau et al. (2000); Konacki et al. (2003); Bouchy et al. (2004); Pont et al. (2004); Torres et al. (2004); Alonso et al. (2004); Sozzetti et al. (2004); Sato et al. (2005);
[...] Gillon et al. (2007); Minniti et al. (2007); Winn et al. (2007a); Kovacs et al. (2007); Bouchy et al. (2008). The discovery papers are in brackets. The table is taken from F. Pont's site:
http://www.inscience.ch/transits/
Fig. 1. Distribution of stars as a function of their
metallicity [Fe/H]. Upper panel: Fraction of stars with planets as a
function of their metallicity, as obtained from radial velocity
surveys (Santos et al. 2004; Fischer & Valenti 2005). Bottom
panel: Normalized distribution of stellar metallicities assumed
in Paper I (blue) and in this work (black). The resulting [Fe/H]
distribution of planet-hosting stars is also shown in red.



2.4. Statistical evaluation of the performances of the model 

As shown in detail in the appendix (see online version), the
model is evaluated using univariate, two-dimensional and
multivariate statistical tests. Specifically, we show that the parameters
for the simulated and observed planets have globally the same
mean and standard deviation and that both Student-t tests and
Kolmogorov-Smirnov tests indicate that the two populations are
statistically indistinguishable. However, while these univariate
tests provide preliminary tests of the quality of the data, they
are not sufficient because of the multiple correlations between
parameters of the problem.
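
The two univariate tests named above can be sketched in a few lines. The example is a minimal illustration, not the procedure of the appendix: it computes the two-sample Kolmogorov-Smirnov statistic with its standard asymptotic p-value, and a Welch-type t statistic. The function names and the toy samples are hypothetical.

```python
import math
import statistics


def ks_statistic(a, b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sample p-value from the Kolmogorov series."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the series is essentially 1 here and converges poorly
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


def welch_t(a, b):
    """Difference of means in units of its standard error (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


# Toy samples standing in for one parameter of observed and simulated planets.
observed = [1.0 + 0.01 * k for k in range(31)]
simulated = [1.0 + 0.001 * k for k in range(300)]
d = ks_statistic(observed, simulated)
print(d, ks_pvalue(d, len(observed), len(simulated)), welch_t(observed, simulated))
```

A large p-value means the test cannot distinguish the two samples in that single parameter, which is why the text goes on to multivariate tests that account for correlations between parameters.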

Table 3 presents the Pearson correlation coefficients between
each variable. It shows that the problem indeed possesses
multiple, complex correlations. In this table, the variable Y
characterizes the 'reality' of the planet considered (it is equal to 1 if the
planet of the list is an observed one, and to 0 if it is a simulated
planet). We see that Y is very weakly correlated with parameters
of the problem. This indicates that the model is well-behaved,
but does not constitute a complete validity test in itself.
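
The correlation of a parameter with the 'reality' variable Y can be made concrete with a small sketch: observed and simulated values are pooled, Y is set to 1 or 0 accordingly, and the Pearson coefficient is computed. The function names and the toy values are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def correlation_with_reality(observed, simulated):
    """Correlate a parameter with Y (1 for observed planets, 0 for simulated ones)."""
    values = list(observed) + list(simulated)
    y = [1] * len(observed) + [0] * len(simulated)
    return pearson(values, y)


# Same distribution in both samples: Y carries no information.
r_same = correlation_with_reality([1, 2, 3], [1, 2, 3] * 10)
# Observed values offset from the simulated ones: Y correlates strongly.
r_offset = correlation_with_reality([10, 11, 12], [1, 2, 3] * 10)
print(r_same, r_offset)
```

A coefficient close to zero, as in the first case, is what a well-behaved model should give for every parameter.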

Table 4 presents the results of a multivariate test using a
so-called logistic regression (see the appendix for more details).
This method allows all planet characteristics to be included
simultaneously as predictors of the probability of being a known
transiting planet (hereafter named 'real' planets as opposed to
simulated ones), thereby controlling for the correlations between
variables at once. Based on the maximum likelihood estimation method,
it provides information on whether a given characteristic is
positively (resp. negatively) and significantly (resp. non
significantly) related with the fact of being a real planet. Moreover,
it computes the probability P_χ² as a general assessment of the
quality of the fit. In our case, a large P_χ² implies no significant
difference between the simulated and real planets. Globally, the
general fit of the model shows that simulated planets are not
significantly distinct from real planets (P_χ² = 0.765). This can be
compared to a model in which model radii are artificially
increased by 10%, for which P_χ² ~ 10^-4 (see appendix).
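
The logic of the logistic test can be sketched for a single predictor. The analysis in this article uses all planet characteristics at once and evaluates significance with bootstrap; the example below is a deliberately reduced version that fits P(Y = 1 | x) by Newton iterations and, as an assumption about how a global probability of this kind is commonly obtained, converts the likelihood-ratio statistic against an intercept-only model into a p-value for one degree of freedom. All names and data are hypothetical.

```python
import math


def sigmoid(z):
    z = max(min(z, 30.0), -30.0)  # clamp so exp() and log() stay finite
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total


def fit_logistic(x, y, iters=100):
    """Maximum likelihood fit of intercept b0 and slope b1 by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def global_fit_probability(x, y):
    """Likelihood-ratio p-value against the intercept-only model (1 d.o.f.)."""
    b0, b1 = fit_logistic(x, y)
    frac = sum(y) / len(y)
    null_ll = sum(math.log(frac) if yi else math.log(1.0 - frac) for yi in y)
    lr = max(2.0 * (log_likelihood(x, y, b0, b1) - null_ll), 0.0)
    return math.erfc(math.sqrt(lr / 2.0))  # chi-square survival function, 1 d.o.f.


simulated = list(range(10)) * 5
y = [1] * 10 + [0] * len(simulated)
p_same = global_fit_probability(list(range(10)) + simulated, y)
p_offset = global_fit_probability(list(range(5, 15)) + simulated, y)
print(p_same, p_offset)
```

When the 'real' and simulated values follow the same distribution the predictor does not help to tell them apart and the probability is close to 1; when the real values are offset, the probability collapses, in the same way as for the model with artificially inflated radii.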
Table 3. Pearson correlations between planetary and stellar characteristics. Significant correlations (> 0.5) are boldfaced.

★ : variable Y has value 1 if the planet is observed, 0 if it is simulated.



Table 4. Logistic maximum likelihood estimates: β is indicative
of a correlation with Y; "t-stat" is the distance in standard
deviations from no correlation, and P represents the probability
that the model and observations are not significantly different.

Maximum likelihood estimations
Probability P_χ² = 0.756



Table 4 also presents, for each of the seven independent variables
of the problem plus the planet equilibrium temperature T_eq and
Safronov number θ, how a given variable is correlated with the
fact that a planet is "real" (as opposed to being one of the
simulated planets in the list). The different statistical parameters
presented in this table are defined in the appendix. We only provide
here a short description: β is indicative of a correlation between a
given variable and the Y (reality) variable, "t-stat" represents the
distance from the mean in terms of standard deviations (Student-t
test), and P represents the probability that the correlation is
significant. The last two parameters are evaluated using bootstrap.

The fact that the parameters β in Table 4 are non-zero
indicates that there is a correlation between each parameter and the
variable Y. However, the Student-t test indicates that in every
case but one (for [Fe/H]), the values obtained for β are
consistent with 0 to within one standard deviation: the agreement
between model and observations is good. This is further shown by
the high P values (indicative of consistency between model and
observations). The lowest P value is associated with the stellar
metallicity [Fe/H], but it is high enough not to show a statistically
significant difference between our modeled sample and real
observations. However, this characteristic is the one with the largest
error bars, and the only one to have missing data (for TrES-3,
TrES-4, WASP-2 and CoRoT-Exo-2). We included [Fe/H], as it
is an important feature of our model, in our multivariate
analysis, but the comparison with real planets for this characteristic
is to be considered carefully. The quality of the agreement
between observed planets characteristics and our model improves
to 88.4% if we remove [Fe/H] from our logistic maximum
likelihood estimates (see the appendix for details and further tests).



2.5. Updated mass-radius diagram 

Throughout the article, we will use density maps of the simu- 
lated detections and compare them to the observations. These 
density maps use a resolution disk template to get smooth plots. 
The size of the resolution template is a function of the num- 
ber of events present in the diagram. The color levels follow a 
linear density rule for most diagrams we show. In the case of 
specific diagrams showing rare long period discoveries (more 
than 5 days) and large surface gravity or Safronov number, we 
choose to use a logarithmic color range for density maps to em- 
phasize these rare events. A probability map is established us- 
ing the model detections sample (50,000 detections obtained by 
simulating multiple times the number of observations from the 
OGLE survey). Again, we stress that we limited our model to
planets above 0.3 M_Jup, both because the question of the
composition becomes more important and complex for small planets,
and because RV detection biases are also more significant for them,
so that their distribution is only partially known from RV surveys.
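
The density maps can be illustrated with a minimal sketch of the resolution-disk smoothing: at each location the number of simulated detections inside a disk is counted and divided by the largest count found on the map. This is not the plotting code used for the figures; it assumes both axes have already been rescaled to comparable units, evaluates the maximum only on the chosen grid, and uses hypothetical function names and toy points.

```python
def disk_count(points, x, y, radius):
    """Number of points inside the resolution disk centred on (x, y)."""
    r2 = radius * radius
    return sum(1 for px, py in points if (px - x) ** 2 + (py - y) ** 2 <= r2)


def density_map(points, xs, ys, radius):
    """Disk counts on a grid, normalized by the largest count on that grid."""
    counts = [[disk_count(points, x, y, radius) for x in xs] for y in ys]
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0 for _ in xs] for _ in ys]
    return [[c / peak for c in row] for row in counts]


# Toy detections: two close to (0, 0), one at (1, 1).
detections = [(0.0, 0.0), (0.0, 0.1), (1.0, 1.0)]
grid = density_map(detections, xs=[0.0, 1.0], ys=[0.0, 1.0], radius=0.5)
print(grid)
```

A larger disk gives smoother contours at the cost of resolution, which is why the disk size is tied to the number of events in each diagram.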

Figure 2 shows the mass-radius diagram density map
simulated with CoRoTlux and compared to the known planets. Gaps
in the diagram at ~3 M_Jup and ~6-7 M_Jup are due to the
small sample of close-in RV planets in these ranges and the fact
that our mass distribution is obtained by cloning these observed
planets rather than relying on a smooth distribution (see Paper I
for a discussion). These gaps should disappear with more
discoveries of close-in planets by RV. Otherwise, the model distribution
and the known planets are in fairly good agreement, as indicated
by the 1.7-1.8σ distance to the maximum likelihood for this
diagram (Table 9). However, the agreement is not as good as one
would expect, probably because of two planets that possess
especially large radii: CoRoT-Exo-2b and TrES-4b. The existence of
these planets is a problem for evolution models in general that
goes beyond the present statistical tests that we propose in this
article.



3. Trends between mass, surface gravity and orbital period

3.1. A correlation between mass and orbital period of Pegasids

Figure 3 compares the known radial-velocity planets to the ones
detected in transit. The figure highlights the fact that transit
surveys are clearly biased towards detecting short-period planets.
However, as shown in Paper I and furthermore reinforced in the
present study, the two populations are perfectly compatible
provided a limited proportion of very short-period planets (P < 2 days)
are added.
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Fig. 2. Mass - radius relation for the transiting Pegasids discov- 
ered to date (filled circles for planets with low Safronov number 
θ < 0.05, open circles for planets with higher θ values). The joint
probability density map obtained from our simulation is shown 
as grey contours (or color contours in the electronic version of 
the article). The resolution disk size used for the contour plot 
appears in the bottom left part of the picture. At a given (x,y) 
location the normalized joint probability density is defined as 
the number of detected planets in the resolution disk centered on 
(x,y) divided by the maximum number of detected planets in a 
resolution disk anywhere on the figure. 
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Fig. 3. Mass-period distribution of known short-period exoplan- 
ets. Crosses correspond to non-transiting planets discovered by 
radial-velocity surveys. Open and filled circles correspond to 
transiting planets (with Safronov numbers below and over 0.05, 
respectively).



Mazeh et al. (2005) pointed out the possibility of an intriguing
correlation between the masses and periods of the first six known
transiting exoplanets. Figure 3 shows that the trend is confirmed
with the present sample of planets. This correlation may be due to a
migration rate that is inherently dependent upon planetary mass or to
other formation mechanisms. It is not the purpose of the present
article to analyze this correlation. However, because we use clones
of the radial-velocity planets in our model, it is important to
stress that this absence of small-mass planets at very short orbital
distances may underlie some of the results that will be discussed
hereafter.



Fig. 4. Planetary surface gravity versus orbital period of
transiting giant planets discovered to date (circles) compared to a
simulated joint probability density map (contours). Symbols and
density plot are the same as in Fig. 2.



3.2. A correlation between surface gravity and orbital period 
of Pegasids?

The existence of a possible anti-correlation between planetary
surface gravity $g = G M_p / R_p^2$ and the orbital period of the
first nine transiting planets has been pointed out for some time
(Southworth et al. 2007). This correlation still holds (Fig. 4) for
the Pegasids with periods below 5 days and with Jovian masses
discovered to date. At the same time, it is important to stress that
massive objects (XO-3b, HAT-P-2b and HD 17156b) are clear outliers
(see Fig. 5): their much larger surface gravity probably implies that
they are in a different regime.

Our model agrees well with the observations (in this P - g diagram,
real planets are at 0.51σ from the maximum likelihood of the
simulated results). We can explain the apparent correlation in
Figure 4 as stemming from the existence of two zones with few
detectable transiting giant planets:

1. The bottom left part of the diagram, where planets are rare
because of a lack of light planets (with low surface gravity) with
short periods, as discussed in Section 3.1.

2. The upper right part of the diagram (high surface gravity, low
planetary radius), where planets are less likely to transit and more
difficult to detect.

Figure 5 shows the same probability density map as in Fig. 4 but at
a larger scale in period and gravity. The three outliers to the
"correlation" appear. These are the large-mass planets XO-3b,
HAT-P-2b and HD 17156b. Given the method chosen to draw the planet
population with CoRoTlux, the probability density function that we
derive is small but non-zero around these objects, and also elsewhere
in the diagram, due to the presence of non-transiting giant planets
with appropriate characteristics. Seen at this larger scale, it is
clear that the planetary-gravity vs. period relation is much more
complex than a simple linear relation.

Globally, Figures 4 and 5 indicate that the relation between planet
surface gravity and orbital period is not a consequence of a link
between the planet composition and its orbital period. Rather, we see
it as a consequence of the correlation between planetary mass and
orbital period for short-period giant planets, which is, as discussed
in the previous section, probably linked to mass-dependent migration
mechanisms.

Fig. 5. Same figure as Figure 4 with extended surface gravity and
period ranges. Note that the scale of the color levels is
logarithmic, in order to emphasize the presence of outliers.

Table 5. Mean planet radius for cool versus hot stars

                                Cool stars        Hot stars
                                Teff < 5400 K     Teff > 5400 K
"Real" planets                  1.072 RJup        1.267 RJup
All simulated planets           1.058 RJup        1.202 RJup
Detectable simulated planets    1.074 RJup        1.251 RJup



4. A correlation between stellar effective 
temperature and planet radius?

The range of radii of Pegasids is surprisingly large, especially
when one considers the difference in compositions (masses of heavy
elements varying from almost 0 to ~100 M⊕) that are required to
explain known transiting planets within the same model (Guillot et
al. 2006; Guillot 2008). Our underlying planet composition/evolution
model is based on the assumption of a correlation of the stellar
metallicity with the heavy element content in the planet. We checked
that no other variable is responsible for a correlation that would
affect this conclusion.

We present the results obtained in the Teff - Rp diagram as they are
the most interesting: the two variables are indeed positively
correlated. Furthermore, given that errors in the stellar parameters
are the main sources of uncertainty in the planetary radii
determinations, one could suppose that a systematic error in the
stellar radius measurement as a function of effective temperature
could be the cause of the variation in the estimated planetary radii.
If true, this might alleviate the need for extreme variations in
composition and would cast doubt on the correlation between stellar
metallicity and planetary heavy-element content.

As shown in Table 5, the mean radius of planets orbiting cool stars
(Teff < 5400 K) is 1.072 RJup and it is 1.267 RJup for planets
orbiting hot stars (Teff > 5400 K). Slightly smaller values are
obtained in our simulation when considering all transiting planets.
However, the values obtained when considering only the detectable
transiting planets are in extremely good agreement with the
observations.



Fig. 6. Stellar effective temperature versus planetary radius of
transiting giant planets discovered to date (circles) compared to a
simulated joint probability density map (contours). The black line is
the sliding average of radii in the [-250 K, +250 K] effective
temperature interval for all simulated transiting planets (both below
and over the detection threshold). The white line is the same average
for the detectable planets in the simulation. The symbols and density
map are the same as in Fig. 2.

Figure 6 shows in more detail how stellar effective temperature and
planetary radius are linked. We interpret the correlation between the
two as the combined effect of irradiation (visible with the plotted
average radius of all planets with at least one transit event in
simulated light curves) and detection bias (visible with the plotted
average radius of simulated planets detected):

1. The planets orbiting bright stars are more irradiated. The mean
radius of a planet orbiting a warmer star is thus higher at a given
period. This effect is taken into account in our planetary evolution
model (see Guillot & Showman 2002; Guillot et al. 2006).

2. The detection of a planet of a given radius is easier for cooler
stars, since for main-sequence stars effective temperature and
stellar radius are positively correlated.
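
The detection bias in item 2 can be illustrated with the standard geometric transit depth, (Rp/R*)^2. The block below is only a minimal sketch under that assumption: it is not the CoRoTlux detection criterion, and the function name and the example radii are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: geometric transit depth, not the CoRoTlux criterion.
R_JUP_IN_R_SUN = 0.10276  # Jupiter radius in solar radii (7.1492e7 m / 6.957e8 m)


def transit_depth(rp_rjup, rstar_rsun):
    """Fractional flux drop (Rp/R*)^2, ignoring limb darkening and grazing transits."""
    ratio = rp_rjup * R_JUP_IN_R_SUN / rstar_rsun
    return ratio ** 2


# Hypothetical example: the same 1.2 RJup planet around a smaller (cooler)
# and a larger (hotter) main-sequence star.
for rstar in (0.8, 1.3):
    print(f"R* = {rstar:.1f} Rsun -> depth = {transit_depth(1.2, rstar):.4f}")
```

For a fixed planetary radius the depth scales as 1/R*^2, so the signal is shallower around the larger, hotter star.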

We therefore conclude that the correlation between effective
temperature and planetary radius is a consequence of the physics of
the problem rather than the cause of the spread in planetary radii.
This implies that another explanation, namely an important variation
of the planetary composition, is needed to account for the observed
radii.

As in the mass-radius diagram (Fig. 2), there is an outlier at the
bottom of Figure 6: HD 149026b. As discussed previously, this object
lies at the boundary of what we could simulate, both in terms of
masses and amounts of heavy elements, so that we do not consider this
as significant. It is also presently not detectable from a transit
survey. Clearly, with more sensitive transit surveys, low-mass
planets with a large fraction of heavy elements compared to hydrogen
and helium will populate the bottom part of this diagram.

A last, secondary outcome of the study of this diagram concerns the
possible existence of two groups of planets roughly separated by a
Teff = 5400 K line. We find that the existence of two such groups
separated by ~200 K or more appears serendipitously in our model in
10% of the cases and is therefore not statistically significant.



5. Two classes of Hot Jupiters, based on their 
Safronov numbers? 



According to Hansen & Barman (2007), the 16 planets discovered at the
time of their study show a bimodal distribution in Safronov numbers,
half of the sample having Safronov numbers θ ~ 0.07 ("class I") while
the other half is such that θ ~ 0.04 ("class II"). They also point
out that the equilibrium temperatures of the two classes of planets
differ, the class II planets being on average hotter. This is
potentially of great interest because the Safronov number is
indicative of the efficiency with which a planet scatters other
bodies, and therefore this division in two classes, if real, may tell
us something about the processes that shaped planetary systems.

5.1. No significant gap between two classes. 

Figure 7 shows how the situation has evolved with the new transiting
giant planets discovered thus far: although a few planets have
narrowed the gap between the two ensembles of planets, it is still
present and located at a Safronov number ~0.05. The two classes also
have mean equilibrium temperatures that differ.

On the other hand, our model naturally predicts a continuous
distribution of Safronov numbers. A trend is found in which planets
with high equilibrium temperatures tend to have lower Safronov
numbers, which is naturally explained by the fact that equilibrium
temperature and orbital distance are directly linked (remember that
$\theta = (a/R_p)(M_p/M_*)$).
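
The trend can be made concrete with a small numerical sketch. The Safronov number uses the relation quoted above; the equilibrium temperature formula (zero albedo, full heat redistribution) is an assumption introduced here for illustration and is not necessarily the definition used in the model. Function names and the example planet are hypothetical.

```python
import math

AU_M = 1.495978707e11      # astronomical unit [m]
R_JUP_M = 7.1492e7         # Jupiter equatorial radius [m]
R_SUN_M = 6.957e8          # nominal solar radius [m]
M_JUP_IN_M_SUN = 9.546e-4  # Jupiter mass in solar masses


def safronov_number(a_au, rp_rjup, mp_mjup, mstar_msun):
    """theta = (a / Rp) * (Mp / M*), with both ratios made dimensionless."""
    a_over_rp = (a_au * AU_M) / (rp_rjup * R_JUP_M)
    mass_ratio = (mp_mjup * M_JUP_IN_M_SUN) / mstar_msun
    return a_over_rp * mass_ratio


def equilibrium_temperature(teff_k, rstar_rsun, a_au):
    """ASSUMED form: Teq = Teff * sqrt(R* / (2 a)), zero albedo, full redistribution."""
    return teff_k * math.sqrt(rstar_rsun * R_SUN_M / (2.0 * a_au * AU_M))


# Hypothetical 1 MJup, 1.2 RJup planet around a Sun-like star, moved outwards:
# theta grows linearly with a while Teq falls as a**-0.5.
for a in (0.03, 0.05, 0.07):
    theta = safronov_number(a, 1.2, 1.0, 1.0)
    teq = equilibrium_temperature(5800.0, 1.0, a)
    print(f"a = {a:.2f} AU  theta = {theta:.3f}  Teq = {teq:.0f} K")
```

Under these assumptions a planet on a tighter orbit is both hotter and has a lower Safronov number, which is the sense of the trend described in the text.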

We find that our θ - Teq joint probability density function is
representative of the observed population, being at 0.68σ from the
maximum likelihood (see appendix). A K-S test on the Safronov number
yields a distance between the observed and simulated distributions of
0.163 and a corresponding probability for a good match of 0.38, a
value that should be improved in future models, but that shows that
the two ensembles are statistically indistinguishable.

Figure 8 compares the histogram of the distribution of Safronov
numbers for simulated detections with the histogram of real events.
Interestingly, although the distributions seem to differ in the
0.05-scale histogram, with a gap appearing in the 0.05 - 0.055 slot,
they match each other in the 0.1-scale histogram, which is more
appropriate for this low-number statistical analysis (7 intervals for
31 events).

Figure 9 shows the probability of obtaining a gap of a given size
between the Safronov numbers of two potential groups in a random
draw. 26 of the known transiting Pegasids have a Safronov number
between 0 and 0.1. Setting a minimum number of 5 planets in each of
the two classes, we look for the largest gap between Safronov numbers
in a random draw of 26 simulated Pegasids. For each of the 10,000
Monte Carlo draws among the model detections sample, we calculate the
largest difference between successive Safronov numbers of the 26
drawn planets. We find that a gap of 0.0102 between two potential
groups is an uncommon event (10% of the cases: 4% of the cases have
gaps of this size, and a further 6% have larger gaps), yet it is not
exceptionally rare. Considering the 7 planet/star characteristics and
their many possible combinations, this level of "rarity" is not
statistically significant.
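
The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the pool of simulated Safronov numbers is a uniform placeholder standing in for the model detections sample, so the printed fraction is not expected to reproduce the 10% quoted in the text.

```python
import random


def largest_gap(values, min_group=5):
    """Largest difference between successive sorted values, keeping only
    gaps that leave at least `min_group` values on each side."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2 * min_group:
        raise ValueError("need at least 2 * min_group values")
    # Gap i separates ordered[:i + 1] from ordered[i + 1:].
    return max(ordered[i + 1] - ordered[i]
               for i in range(min_group - 1, n - min_group))


def gap_probability(pool, observed_gap, n_draw=26, n_trials=10000, seed=0):
    """Fraction of random draws whose largest admissible gap is at least
    as large as the observed one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        draw = rng.sample(pool, n_draw)  # draw without replacement
        if largest_gap(draw) >= observed_gap:
            hits += 1
    return hits / n_trials


# PLACEHOLDER pool: uniform Safronov numbers, NOT the CoRoTlux detections.
placeholder_rng = random.Random(1)
pool = [placeholder_rng.uniform(0.02, 0.09) for _ in range(5000)]
print(gap_probability(pool, observed_gap=0.0102))
```

Replacing the placeholder pool with the Safronov numbers of the simulated detections would give the kind of occurrence curve shown in Figure 9 when `observed_gap` is scanned over a range of values.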
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Fig. 7. Safronov number versus equilibrium temperature of tran- 
siting giant planets discovered to date (circles) compared to a 
simulated joint probability density map (contours). Open (resp. 
filled) circles correspond to class I (resp. class II) planets. The 
symbols and density map are the same as in Fig. 2.
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Fig. 8. Comparison of the distribution of Safronov number be- 
tween simulated detections (Red) and real events (Black). Top: 
histogram with 6 0.1-scale columns. Bottom: histogram with 12
0.05-scale columns.



It is also interesting to consider the few high-Safronov-number
planets discovered, as in Figure 10. The different gaps in the
diagram are due to our mass vs. period clones of RV planets, which do
not uniformly cover the parameter space. The empty region at the
right edge of the density map is due to the absence of massive
planets in the [3, 15] MJup range on close orbits among the RV
planets. The simulated detections at both high Safronov number and
high equilibrium temperature correspond to simulated clones of the
planet HD 41004b, with its large mass of 18 MJup and its very
close-in period of 1.33 days.

5.2. No bimodal distribution visible in other diagrams. 

When plotted as a function of different stellar (effective
temperature, mass, radius) and planetary characteristics (mass,
radius, period, equilibrium temperature), the two potential Safronov
classes do not differ in a significant way. When plotting our
simulated detections as a function of their Safronov number in
different diagrams, the two groups formed by cutting our model
detections sample with a Safronov number cut-off set at 0.05 partly
overlap each other on most diagrams. Here, we choose to present the
planetary mass vs. equilibrium temperature diagram, which used to
provide a clear separation between the two populations (Hansen &
Barman 2007; Torres et al. 2008). We present this diagram in Fig. 11
as an example of partial overlap of the class I and class II detected
planets and probability density maps. Contrary to indications based
on a smaller sample of observations, there is no longer a clear
separation in this diagram between class I and class II planets.

Fig. 9. Occurrence of the largest observed separation of Safronov
numbers between two 'groups' selected in random draws among the model
detections sample. The vertical line shows the separation (0.0102)
between the two classes of planets as inferred from the observational
sample.

Fig. 10. Same as Figure 2 but for a larger range of Safronov numbers.
Note that the scale of the color levels is logarithmic, in order to
emphasize the presence of outliers.



5.3. No correlation between metallicity and Safronov 
number/class. 




Torres et al. (2008) showed that a significant difference could be
observed between the metallicity distributions of the two Safronov
classes. The high-Safronov-number class (class I, θ > 0.05) had its
host star metallicity centered on 0.0, and the low-Safronov-number
class (class II) was centered on 0.2. They pointed out that the
Safronov numbers for class I planets show a decreasing trend with
metallicity.

Fig. 11. Planetary mass versus equilibrium temperature of transiting
giant planets discovered to date (circles) compared to a simulated
joint probability density map (contours). Top panel: the density map
accounts only for simulated planets with a Safronov number θ > 0.05
(class I planets). Bottom panel: the density map corresponds only to
planets with θ < 0.05 (class II planets). The symbols and density
maps are the same as in Fig. 2.

The two recent discoveries of CoRoT-Exo-1b ([Fe/H] = -0.4 and θ =
0.038) and OGLE-TR-182b ([Fe/H] = 0.37 and θ = 0.08) tend to
contradict this argument. Considering the 31 known giant planets, the
mean metallicity of stars hosting class I planets is now [Fe/H] =
0.6, and it is 1.6 for class II planets. Figure 12 shows that
although the observed metallicity vs. Safronov number distribution is
a likely outcome of our simulation (0.63σ from maximum likelihood),
the potential anticorrelation between θ and host star [Fe/H] (pointed
out by Torres et al. 2008) for class I planets is not present in our
simulation, which shows a continuous density map.

5.4. No significant gap between two Safronov number 
classes. 

Our study has shown us that a separation between two groups of
planets linked to their Safronov number is unlikely for at least two
reasons:

1. The separation between the two groups is marginal. It only appears
in the Safronov number histogram if the resolution of the histogram
is high in comparison to the number of events sampled. The separation
of ~0.01 between two possible Safronov classes has a non-negligible
10% probability of occurring serendipitously in our distribution,
which is otherwise continuous. Considering the relatively numerous
parameters (4 for the star, 3 linked with the planet) and their
combinations, such a division in two groups appears quite likely to
occur fortuitously for one such parameter.

2. The separation between the two classes is not present in any
figures other than the ones involving the Safronov number itself.
This also includes the separation in metallicity vs. θ, which is not
statistically significant, especially given the recent discoveries of
CoRoT-Exo-1b and OGLE-TR-182b.

On the other hand, we cannot formally rule out the existence of these
two groups of planets. We hence eagerly await other observations of
transiting planets for further tests.

Fig. 12. Safronov number of transiting planets as a function of their
host star metallicity. The density map with linear contours comes
from the model detections sample. Open and filled circles are
respectively class I planets (with Safronov number over 0.05) and
class II planets (with Safronov number below 0.05). Symbols and
density plot are the same as in Fig. 2.



6. Conclusions 

We have presented a coherent model of a population of stars and 
planets that matches within statistical errors the observations of
transiting planets performed thus far. Thanks to new observa- 
tions, we have improved on our previous model (Paper I). In 
particular, we now show that with slightly improved assump- 
tions about the metallicity of stars in the solar neighborhood, the 
metallicity of stars with transiting giant planets can be explained 
without assuming any bias in period vs. metallicity. 

In order to validate our model, we have used a series of univariate,
bivariate and multivariate statistical tests. As the samples of
radial-velocimetry planets and of transiting planets grow, we
envision that with these tools we will be able to characterize much
better the planet population in our Galaxy and its dependence on the
stellar population, and also to test models of planet formation and
evolution.

With today's sample of transiting planets, our model provides a very
good match with the observations, both when considering planetary and
stellar parameters one by one and globally. Our analysis has revealed
that the parameters for the modeled planets are presently
statistically indistinguishable from the observations, although there
may be room for improvement of the model. It should be noted that our
underlying assumptions for the compositions and evolution of planets
and for the stellar populations are relatively simple. With a larger
statistical sample, tests of these assumptions will be possible and
will bring important constraints on the planet-star distribution in
our galactic neighborhood. The CoRoT mission is expected to be very
important in that respect, especially given the careful determination
of the characteristics of the stellar population that is being
monitored.

Using this method, we have been able to analyze and explain the
different correlations observed between the characteristics of
transiting planets:

1. Mass vs. period: One of the first correlations observed among the
planet/star characteristics was the mass vs. period relation of
close-in RV planets (Mazeh et al. 2005). Although our model does not
explain it, we confirm with a sample that is now 4 times larger than
at the time of the publication that there is a lack of low-mass
planets (Mp < 1 MJup) on very short periods (P < 2 days).

2. Surface gravity vs. period: There is an inverse correlation
between the surface gravity and period of transiting planets. We show
that this correlation is caused by the above mass vs. period effect,
and by a lower detection probability for planets with longer periods
and higher surface gravities.

3. Radius vs. stellar effective temperature: Planets around stars
with larger effective temperatures tend to have larger sizes. This is
naturally explained by a combination of slower contraction due to the
larger irradiation and by the increased difficulty in finding planets
around hotter, larger stars.

4. Safronov number: Hansen & Barman (2007) and Torres et al. (2008)
have identified a separation between two classes of planets, based on
their Safronov number, and visible in different diagrams (θ vs. Teq
and vs. [Fe/H], Mp vs. Teq). With recent discoveries, this separation
is still present in the Safronov number distribution, but no longer
in other diagrams. On the other hand, our simulation predicts
distributions that are continuous, in particular in terms of Safronov
number. With this continuous distribution, we show that a random draw
of 30 simulated planets produces two spurious groups separated in
Safronov number by a distance equal to or larger than the
observations in 30% of the cases. The separation between the two
classes is neither visible nor significant in any other diagram we
plotted. Therefore, we conclude that the separation in two classes is
not statistically significant but is to be checked again with a
larger sample of observed planets. Interestingly, if on the contrary
two classes of Safronov numbers were found to exist, we would have to
revise our model for the composition of planets.

In the next few years, precise analyses of surveys with well-defined
stellar fields and high yields (like CoRoT and Kepler) will allow us
to test different formation theories precisely and to link planetary
and stellar characteristics. They should also help to pin down the
laws behind the occurrence of planets and their orbital and physical
parameters. Up to now, we have focused on giant planets, but with
larger statistical samples, we hope to be able to extend this kind of
study to planets of smaller masses, which will be intrinsically more
complex because of a larger variety in their compositions (rocks,
ices, gases). Altogether, this stresses the need for a continuation
of radial-velocity and photometric surveys for, and follow-up
observations of, new transiting planets to greatly increase the
sample of known planets and obtain accurate stellar and planetary
parameters. The goal is important: to better understand what our
galactic neighborhood is made of.

Acknowledgments 

The code used for this work, CoRoTlux, was developed as part of the
CoRoT science program by the authors with major contributions by
Aurelien Garnier, Maxime Marmier, Vincent Morello, Martin Vannier,
and help from Suzanne Aigrain, Claire Moutou, Stephane Lagarde,
Antoine Llebaria, Didier Queloz, and Francois Bouchy. We thank F.
Pont for many fruitful discussions on the subject, and an anonymous
referee for a detailed review that helped improve the manuscript.
F.F. was funded by a grant from the French Agence Nationale pour la
Recherche. This work made extensive use of Jean Schneider's exoplanet
database www.exoplanet.eu, Frederic Pont's table of transiting
planets characteristics http://www.inscience.ch/transits/ and the
Besançon model of the Galaxy at physique.obs-besancon.fr/modele/. The
planetary evolution models used for this work can be downloaded at
www.obs-nice.fr/guillot/pegasids/.



References

Abramowitz, M. & Stegun, I. A. 1964, Handbook of Mathematical Functions, ninth Dover printing, tenth GPO printing edn. (New York: Dover)
Aldrich, J. & Nelson, F. 1984, Linear Probability, Logit, and Probit Models (Sage Series on Quantitative Analysis)
Alonso, R., Auvergne, M., Baglin, A., et al. 2008, Astron. & Astrophys., 482, L21
Alonso, R., Brown, T. M., Torres, G., et al. 2004, ApJ, 613, L153
Bakos, G., Noyes, R. W., Latham, D. W., et al. 2006, in Tenth Anniversary of 51 Peg-b: Status of and prospects for hot Jupiter studies, ed. L. Arnold, F. Bouchy, & C. Moutou, 184-186
Bakos, G. A., Shporer, A., Pal, A., et al. 2007, ApJ, 671, L173
Barge, P., Baglin, A., Auvergne, M., et al. 2008, Astron. & Astrophys., 482, L17
Bouchy, F., Pont, F., Melo, C., et al. 2005, Astron. & Astrophys., 431, 1105
Bouchy, F., Pont, F., Santos, N. C., et al. 2004, Astron. & Astrophys., 421, L13
Bouchy, F., Queloz, D., Deleuil, M., et al. 2008, Astron. & Astrophys., 482, L25
Burke, C. J., McCullough, P. R., Valenti, J. A., et al. 2007, ApJ, 671, 2115
Charbonneau, D., Brown, T. M., Latham, D. W., & Mayor, M. 2000, ApJ, 529, L45
Charbonneau, D., Winn, J. N., Latham, D. W., et al. 2006, ApJ, 636, 445
Collier Cameron, A., Bouchy, F., Hebrard, G., et al. 2006, ArXiv Astrophysics e-prints
Duquennoy, A. & Mayor, M. 1991, Astron. & Astrophys., 248, 485
Fischer, D. A. & Valenti, J. 2005, ApJ, 622, 1102
Fressin, F., Guillot, T., Morello, V., & Pont, F. 2007, Astron. & Astrophys., 475, 729
Gillon, M., Pont, F., Moutou, C., et al. 2006, Astron. & Astrophys., 459, 249
Gillon, M., Pont, F., Moutou, C., et al. 2007, Astron. & Astrophys., 466, 743
Greene, W. H. 2000, Econometric Analysis, fourth edn. (Prentice Hall International, International Edition)
Guillot, T. 2008, Physica Scripta Volume T, 130, 014023
Guillot, T., Santos, N. C., Pont, F., et al. 2006, Astron. & Astrophys., 453, L21
Guillot, T. & Showman, A. P. 2002, Astron. & Astrophys., 385, 156
Hansen, B. M. S. & Barman, T. 2007, ApJ, 671, 861
Holman, M. J., Winn, J. N., Latham, D. W., et al. 2006, ApJ, 652, 1715
Knutson, H. A., Charbonneau, D., Noyes, R. W., Brown, T. M., & Gilliland, R. L. 2007, ApJ, 655, 564
Konacki, M., Sasselov, D. D., Torres, G., Jha, S., & Kulkarni, S. R. 2003, in Bulletin of the American Astronomical Society, 1416
Kovacs, G., Bakos, G. A., Torres, G., et al. 2007, ApJ, 670, L41
Mandushev, G., O'Donovan, F. T., Charbonneau, D., et al. 2007, ApJ, 667, L195
Mazeh, T., Zucker, S., & Pont, F. 2005, MNRAS, 356, 955
McCullough, P. R., Stys, J. E., Valenti, J. A., et al. 2006, ApJ, 648, 1228
Minniti, D., Fernandez, J. M., Diaz, R. F., et al. 2007, ApJ, 660, 858
Nordstrom, B., Mayor, M., Andersen, J., et al. 2004, Astron. & Astrophys., 418, 989
O'Donovan, F. T., Charbonneau, D., Bakos, G. A., et al. 2007, ApJ, 663, L37
O'Donovan, F. T., Charbonneau, D., Mandushev, G., et al. 2006, ApJ, 651, L61
Pont, F., Bouchy, F., Melo, C., et al. 2005, Astron. & Astrophys., 438, 1123
Pont, F., Bouchy, F., Queloz, D., et al. 2004, Astron. & Astrophys., 426, L15
Pont, F., Tamuz, O., Udalski, A., et al. 2008, Astron. & Astrophys., 487, 749
Pont, F., Zucker, S., & Queloz, D. 2006, MNRAS, 373, 231
Robin, A. C., Reyle, C., Derriere, S., & Picaud, S. 2003, Astron. & Astrophys., 409, 523
Santos, N. C., Israelian, G., & Mayor, M. 2004, Astron. & Astrophys., 415, 1153
Sato, B., Fischer, D. A., Henry, G. W., et al. 2005, ApJ, 633, 465
Shporer, A., Tamuz, O., Zucker, S., & Mazeh, T. 2007, MNRAS, 136
Smith, A. M. S., Collier Cameron, A., Christian, D. J., et al. 2006, MNRAS, 373, 1151
Southworth, J., Wheatley, P. J., & Sams, G. 2007, MNRAS, 379, L11
Sozzetti, A., Yong, D., Torres, G., et al. 2004, ApJ, 616, L167
Torres, G., Bakos, G. A., Kovacs, G., et al. 2007, ApJ, 666, L121
Torres, G., Konacki, M., Sasselov, D. D., & Jha, S. 2004, ApJ, 609, 1071
Torres, G., Winn, J. N., & Holman, M. J. 2008, ApJ, 677, 1324
Udalski, A. 2003, Acta Astronomica, 53, 291
Winn, J. N., Holman, M. J., Bakos, G. A., et al. 2007a, AJ, 134, 1707
Winn, J. N., Holman, M. J., & Fuentes, C. I. 2007b, AJ, 133, 11
Winn, J. N., Holman, M. J., Henry, G. W., et al. 2007c, AJ, 133, 1828
Winn, J. N., Noyes, R. W., Holman, M. J., et al. 2005, ApJ, 631, 1215



Fressin, Guillot & Nesta: Groups within transiting exoplanets? 






Online data 

Appendix: Statistical evaluation of the model 

6.1. Univariate tests on individual planet characteristics 

In this section, we detail the statistical method and tests that have
been used to validate the model. We first perform basic tests of our
model with simulations repeating multiple times the number of
observations of the OGLE survey in order to get 50,000 detections.
This number was chosen as a compromise between statistical
significance and computation time. Table 6 compares the mean values
and standard deviations in the observations and in the simulations.
The closeness of the values obtained for the two populations is an
indication that our approach provides a reasonably good fit to the
real stellar and planetary populations, and to the real planet
compositions and evolutions.

However, we do require more advanced statistical tests. First, we use
the so-called Student's t-test to formally compare the mean values of
all characteristics for both types of planets. The intuition is that,
should the model yield simulated planets with attributes similar to
those of real planets, the average values of these attributes should
not be significantly different from one another. In other words, the
so-called null hypothesis $H_0$ is that the difference of their means
is zero. Posing $H_0: \mu^r - \mu^s = 0$, where superscripts r and s
denote real and simulated planets respectively, and the alternative
hypothesis being the complement $H_a: \mu^r - \mu^s \neq 0$, we
compute the t statistic using the first and second moments of the
distribution of each planet characteristic as follows:

$$ t = \frac{\bar{x}^r - \bar{x}^s}{s_p \sqrt{\frac{1}{n_r} + \frac{1}{n_s}}} \qquad (2) $$

where x is each of the planet characteristics, n is the size of each
sample, and $s_p$ is the square root of the pooled variance
accounting for the sizes of the two population samples¹. The
statistic follows a t distribution, from which one can easily derive
the two-tailed critical probability that the two samples come from
one unique population of planets, i.e. that $H_0$ cannot be rejected.
The results are displayed in Table 7 (note that θ is the Safronov
number; other parameters have their usual meaning). In all cases, the
probabilities are larger than 40%, implying that there is no
significant difference in the mean characteristics of both types of
planets. In other words, the two samples exhibit similar central
tendencies.
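
As a minimal sketch of this test (not the authors' implementation), the pooled-variance t statistic can be computed with the Python standard library. The function names are hypothetical. Because the simulated sample is very large, the two-tailed probability is approximated here with the normal distribution rather than the exact t distribution; that substitution is an assumption of this sketch.

```python
import math
from statistics import mean, variance


def pooled_t(real, simulated):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom)."""
    n_r, n_s = len(real), len(simulated)
    dof = (n_r - 1) + (n_s - 1)
    # Pooled variance: summed squared deviations over the overall dof.
    sp2 = ((n_r - 1) * variance(real) + (n_s - 1) * variance(simulated)) / dof
    t = (mean(real) - mean(simulated)) / math.sqrt(sp2 * (1.0 / n_r + 1.0 / n_s))
    return t, dof


def two_tailed_p_normal(t):
    """Two-tailed probability using the normal approximation to the
    t distribution (reasonable only when the dof is large)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Toy samples, for illustration only.
t, dof = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"t = {t:.4f}, dof = {dof}")
print(f"p(t = -0.277) ~ {two_tailed_p_normal(-0.277):.3f}")
```

With tens of thousands of simulated detections the number of degrees of freedom is large enough that the normal approximation and the exact t distribution give nearly identical probabilities.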

Next, we perform the Kolmogorov-Smirnov test to allow for
a more global assessment of the compatibility of the two
populations. This test has the advantage of being non-parametric,
making no assumption about the distribution of the data. This is
particularly important since the number of real planets remains small,
which may alter the normality of the distribution. Moreover,
the Kolmogorov-Smirnov comparison tests the stochastic dominance
of the entire distribution of real planets over that of simulated
planets. To do so, it computes the largest absolute deviation D
between F_r(x), the empirical cumulative distribution function of
characteristic x for real planets, and F_s(x), the cumulative
distribution function of characteristic x for simulated planets, over

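Footnote: The pooled variance is computed as the sum of each sample
variance, weighted by its degrees of freedom, divided by the overall
number of degrees of freedom:

    s_p^2 = \frac{(n_r - 1) s_r^2 + (n_s - 1) s_s^2}{(n_r - 1) + (n_s - 1)}    (3)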



Table 7. Test of equality of means. Student's t value and critical
probabilities p that individual parameters for both real and
simulated planets have the same sample mean.

parameter   t        p
M_*         -0.277   0.782
[Fe/H]      -0.392   0.695
T_eff       0.707    0.480
R_*         0.331    0.741
P           -0.276   0.783
M_p         -0.570   0.569
R_p         0.642    0.521
T_eq        0.834    0.405
θ           -0.585   0.559

the range of values of x: D = max_x |F_r(x) − F_s(x)|. If the
calculated D-statistic is greater than the critical D*-statistic
(provided by the Kolmogorov-Smirnov table: for 31 observations,
D* = 0.19 for an 80% confidence level and D* = 0.24 for a 95%
confidence level), then one must reject the null hypothesis that
the two distributions are similar, H_0: |F_r(x) − F_s(x)| < D*, and
accept H_a: |F_r(x) − F_s(x)| > D*. Table 8 shows the result of the
test. The first column provides the D-statistics, and the second
column gives the probability that the two samples have the same
distribution.
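The D-statistic defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for Table 8: the function name is hypothetical, and both cumulative distribution functions are taken to be empirical step functions, so that the largest deviation is necessarily reached at one of the sample values.

```python
from bisect import bisect_right

def ks_statistic(real, simulated):
    """Largest absolute gap D between the two empirical CDFs."""
    r = sorted(real)
    s = sorted(simulated)
    d = 0.0
    # Both CDFs are right-continuous step functions, so the maximum
    # of |F_r - F_s| is attained at one of the sample points.
    for x in r + s:
        f_r = bisect_right(r, x) / len(r)
        f_s = bisect_right(s, x) / len(s)
        d = max(d, abs(f_r - f_s))
    return d
```

The returned value would then be compared with the critical D* quoted in the text for the chosen confidence level.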

Table 8. Kolmogorov-Smirnov tests. D-statistics and critical
probabilities p that individual parameters for both real and
simulated planets have the same distribution.

parameter   D       p
M_*         0.154   0.492
[Fe/H]      0.161   0.438
T_eff       0.135   0.662
R_*         0.141   0.612
P           0.145   0.572
M_p         0.173   0.347
R_p         0.126   0.745
T_eq        0.180   0.303
θ           0.163   0.381



Again, we find a good match between the model and ob- 
served samples: the parameters that have the least satisfactory 
fits are the planet's equilibrium temperature and the planet mass 
respectively. These values are interpreted as being due to im- 
perfections in the assumed star and planet populations. It is im- 
portant to stress that although the extrasolar planets' main char- 
acteristics (period, mass) are well-defined by the radial-velocity 
surveys, the subset of transiting planets is highly biased towards 
short periods and corresponds to a relatively small sample in the 
known radial-velocity planet population. This explains why the 
probability that the planetary mass is drawn from the same dis- 
tribution in the model and in the observations is relatively low, 
which may otherwise seem surprising given that the planet mass 
distribution would be expected to be relatively well defined by 
the radial-velocity measurements. 

6.2. Tests in two dimensions

Tests of the agreement between observations and models in two
dimensions, i.e. when considering one parameter as compared to
another one, can be performed using the method of maximum
Table 6. Mean values and standard deviations for the system parameters for the observed transiting planets and our simulated
detections.

real planets
        M_p     R_p     P       T_eq    M_*     R_*     T_eff   [Fe/H]   θ
mean    1.834   1.235   3.387   1510    1.094   1.164   5764    0.087    0.148
σ       2.645   0.178   3.540   300.2   0.186   0.304   464     0.187    0.270

simulated detections
        M_p     R_p     P       T_eq    M_*     R_*     T_eff   [Fe/H]   θ
mean    1.655   1.248   3.217   1564    1.073   1.167   5813    0.07805  0.129
σ       2.401   0.186   2.897   411.0   0.195   0.324   599     0.217    0.270


Table 9. Standard deviations from maximum likelihood of the
model and observed transiting planet populations.

parameter        planets from transit surveys   all planets
M_p vs. P        1.19σ                          1.25σ
... vs. P        1.48σ                          1.62σ
R_p vs. M_p      1.70σ                          1.82σ
P vs. [Fe/H]     1.09σ                          1.09σ
R_p vs. [Fe/H]   1.61σ                          1.71σ
g vs. P          0.61σ                          0.51σ
... vs. T_eff    0.68σ                          0.63σ
R_p vs. T_eff    1.22σ                          1.45σ

likelihood as described in Paper I. Table 9 provides values of
the standard deviations from maximum likelihood for important
combinations of parameters. The second column is a comparison
using all planets discovered by transit surveys, and the third
column uses all known transiting planets (including those
discovered by radial velocity).

The results are generally good, with deviations not exceeding
1.82σ. They are also very similar when considering all planets
or only the subset discovered by photometric surveys. This
shows that the radial-velocity and photometric planet
characteristics are quite similar. The mass vs. radius relation
shows the highest deviation, as a few planets are outliers of our
planetary evolution model.

6.3. Multivariate assessment of the performance of the
model

6.3.1. Principle 

Tests such as Student's t-test and the Kolmogorov-Smirnov test
are important to determine the adequacy of given parameters,
but they do not provide a multivariate assessment of the model.
In order to assess globally the viability of our model, we proceed
as follows: we generate a list including 50,000 "simulated" planets
and the 31 "observed" giant planets from Table 1. This number is
necessary to get an accurate multivariate analysis (see Sect. 6.3.2).
A dummy variable Y is generated with value 1 if the planet is
observed, and 0 if the planet is simulated.

In order to test dependencies between parameters, we have
presented in Table 3 (§ 2.4) the Pearson correlation coefficients
between each variable, including Y. A first look at the table shows
that the method rightfully retrieves the important physical
correlations without any a priori information concerning the links
that exist between the different parameters. For example, the stellar
effective temperature T_eff is positively correlated to the stellar
mass M_* and radius R_*. It is also naturally positively correlated
to the planet's equilibrium temperature T_eq, and to the planet's
radius R_p, simply because evolution models predict planetary
radii that are larger for larger values of the irradiation, all other
parameters being equal. Interestingly, it can be seen that although
the Safronov number is by definition correlated to the planetary
mass, radius, orbital period and star mass (see Eq. 1), the largest
correlation coefficients for θ in absolute value are those related
to M_p and P (as the ranges of these parameters both span more
than one decade, while M_* and R_p only vary by a factor 2).
Also, we observe that the star metallicity is only correlated to
the planet radius. This is a consequence of our assumption that
a planet's heavy element content is directly proportional to the
star's [Fe/H], and of the fact that planets with more heavy
elements are smaller, all other parameters being equal. The planet's
radius is itself correlated negatively with [Fe/H] and positively
with T_eq, M_*, R_* and T_eff. Table 3 also shows the correlations
with the "reality" parameter. Of course, a satisfactory model is
one in which there is no correlation between this reality parameter
and the other physical parameters of the model. In our case, the
corresponding correlation coefficients are always small and
indicate a good match between the two populations.
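The correlation with the "reality" parameter can be illustrated with a short Python sketch of the Pearson coefficient. The function name and the toy values below are hypothetical and serve only to show how the dummy variable Y enters the calculation like any other variable; they are not taken from Table 3.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: Y = 1 marks an "observed" planet, Y = 0 a "simulated" one.
toy_radius = [1.0, 1.2, 1.1, 1.3, 1.2, 1.0]
toy_is_real = [0, 0, 0, 0, 1, 1]
r_toy = pearson(toy_radius, toy_is_real)
```

A coefficient close to zero between a physical parameter and Y is what a satisfactory model should produce.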

Obviously the unconditional probability that a given planet
is real is Pr(Y = 1) = 31/50031 ≈ 0.00062. Now we wish to know
whether this probability is sensitive to any of the planet
characteristics, controlling for all planet characteristics at once.
Hence we model the probability that a given planet is "real" using
the logistic cumulative distribution function as follows:

    \Pr(Y = 1 | X_i) = \frac{e^{X_i b}}{1 + e^{X_i b}}    (4)

where X_i is the vector of explanatory variables (i.e. planet
characteristics) for the planet i (real or simulated), b is the vector
of parameters to be estimated, X_i b = b_0 + \sum_{j=1}^{m} X_{ij} b_j,
and b_0 is a constant. There are n events to be considered (i = 1..n)
and m explanatory variables (j = 1..m).

Importantly, an ordinary least squares estimator should not be
used in this framework, due to the binary nature of the dependent
variable (departures from normality and predictions outside the
range [0; 1] are the quintessential motivations). Instead,
Equation 4 can be estimated using maximum likelihood methods.
The so-called logit specification (Greene 2000) fits the parameter
estimates b so as to maximize the log likelihood function

    \log L(Y | X, b) = \sum_{i=1}^{n} y_i X_i b - \sum_{i=1}^{n} \log\left[1 + e^{X_i b}\right]    (5)

The log L function is then maximized by choosing b such that
∂ log L(Y|X, b)/∂b = 0, using a Newton-Raphson algorithm.
The closer the coefficients b_1, b_2, ..., b_m are to 0, the closer
the model is to the observations. Conversely, a coefficient that is
significantly different from zero tells us that there is a
correlation between the corresponding variable and the probability
for a planet to be "real", i.e. the model is not a good match to the
observations.
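To make the maximum likelihood step concrete, here is a minimal Python sketch of a Newton-Raphson fit of the logit model, restricted to a single explanatory variable so that the 2x2 linear system can be written out by hand. The function name and the fixed number of iterations are assumptions made for this illustration; the analysis in the text uses 9 explanatory variables and is not reproduced here.

```python
import math

def fit_logit_1d(x, y, iterations=25):
    """Newton-Raphson fit of Pr(Y=1|x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p             # d logL / d b0
            g1 += (yi - p) * xi      # d logL / d b1
            h00 += w                 # information matrix = minus the Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step b <- b + I^{-1} g, with the 2x2 inverse written explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = sum(yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
                 for xi, yi in zip(x, y))
    # Standard error of b1 from the inverse information matrix
    # (evaluated at the start of the last iteration, i.e. near convergence)
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1, loglik
```

Dividing b1 by se_b1 gives the statistic used to test whether the coefficient differs significantly from zero. When the explanatory variable carries no information on Y, the fitted b1 stays at zero and the log likelihood reduces to that of the constant-only model.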

Two features of logistic regression using maximum likelihood
estimators are worth mentioning. First, the added value of the
exercise is that the multivariate approach allows us to hold all
other planet characteristics constant, extending the bivariate
correlations to the multivariate case. In other words, we
control for all planet characteristics at once. Second, one can test
whether a given parameter estimate is equal to 0 with the usual
null hypothesis H_0: b = 0 versus H_a: b ≠ 0. The variance of the
estimator (see footnote) is used to derive the standard error of the
parameter estimate. Using equation 6, dividing each b_j by its
standard error s.e.(b_j) yields the t-statistic and allows us to test
H_0. We denote by P_j the probability that a higher value of t would
occur by chance. This probability is evaluated for each explanatory
variable j. Should our model perform well, we would expect
the t value of each parameter estimate to be null, and the
corresponding probability P_j to be close to one. This would imply
no significant association between a single planet characteristic
and the event of being a "real" planet.

Last but not least, the global probability that the model and
observations are compatible can be estimated. To do so, we
compute the log likelihood obtained when b_j = 0 for j = 1..m, where
m is the number of variables. Following Eq. 5,

    \log L(Y | 1, b_0) = \sum_{i=1}^{n} y_i b_0 - \sum_{i=1}^{n} \log\left[1 + e^{b_0}\right]    (6)

The maximum of this quantity is log L_0 = n_0 log(n_0/n) +
n_1 log(n_1/n), where n_0 is the number of cases in which y = 0
and n_1 is the number of observations with y = 1. L_0 is thus the
maximum likelihood obtained for a model which is in perfect
agreement with the observations (no explanatory variable is
correlated to the probability of being real). Now, it can be shown
that the likelihood ratio statistic

    c_{ll} = 2 (\log L_1 - \log L_0)    (7)

where log L_1 is the maximized log likelihood of Eq. 5, follows a
χ² distribution with m degrees of freedom when the null hypothesis
is true (Aldrich & Nelson 1984). The probability that a sum of m
squared normally distributed random variables with mean 0 and
variance 1 is larger than a value c_ll is

    \mathcal{P}_{\chi^2} = 1 - P(m/2, c_{ll}/2)    (8)

where P(k, z) is the regularized Gamma function (e.g.
Abramowitz & Stegun 1964). P_χ² is thus the probability that
the model planets and the observed planets are drawn from the
same distribution.
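The global test can be checked numerically with a short Python sketch. The function names are hypothetical, and the regularized Gamma function is evaluated with a plain power series, which is an implementation choice made here for self-containedness and not necessarily what was used for the paper. The log likelihood of -257.059 is the "all variables" value listed in Table 10.

```python
import math

def null_log_likelihood(n0, n1):
    """log L_0 = n0 log(n0/n) + n1 log(n1/n) for the constant-only model."""
    n = n0 + n1
    return n0 * math.log(n0 / n) + n1 * math.log(n1 / n)

def regularized_gamma_p(a, x, terms=500):
    """Regularized lower incomplete Gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_probability(c_ll, m):
    """Probability that a chi-square variable with m dof exceeds c_ll."""
    return 1.0 - regularized_gamma_p(m / 2.0, c_ll / 2.0)

log_l0 = null_log_likelihood(50000, 31)
c_ll = 2.0 * (-257.059 - log_l0)   # full-model log likelihood from Table 10
p_chi2 = chi2_probability(c_ll, 9)
```

With these inputs the sketch gives log L_0 close to -259.97, c_ll close to 5.82 and a probability close to 0.76, in line with the first column of Table 10.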



6.3.2. Determination of the number of model planets
required

A problem that arose in the course of the present work was to
evaluate the number of model planets needed for the logit
evaluation. It is often estimated that about 10 times more
model points than observations are sufficient for a good test. We
found that this relatively small number of points indeed leads to
a valid identification of the explanatory variables that are
problematic, i.e. those for which the b coefficient is significantly
different from 0 (if any). However, the evaluation of the global



Fig. 13. Values of the χ² probability P_χ² (see text) obtained after
a logit analysis, as a function of the size n_0 of the sample of model
planets (horizontal axis: sample size).



probability was then found to show considerable statistical
variability, probably because of the relatively large number of
explanatory variables used for the study.

In order to test how the probability P_χ² depends on the size
n of the sample to be analyzed, we first generated a very large
list of N_0 simulated planets with CoRoTlux. We then drew by
Monte-Carlo a smaller subset of n_0 < N_0 simulated planets,
augmented it with the n_1 = 31 observed planets, and computed
P_χ² using the logit procedure. This exercise was performed 1000
times, and the results are shown in Fig. 13. The resulting P_χ²
is found to be very variable for a sample smaller than ~20,000
planets. As a consequence, we chose to present tests performed
for n_0 = 50,000 model planets.
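The resampling exercise described above can be sketched as follows in Python. This is only a scaffold under stated assumptions: the function names are hypothetical, the subsets are drawn without replacement, and a simple difference of means stands in for the logit-based probability, which would be plugged in as the `statistic` argument in a real application.

```python
import random

def subsample_statistics(simulated, observed, n0, statistic, repeats=1000, seed=0):
    """Evaluate `statistic` on `repeats` random subsets of n0 simulated planets."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        subset = rng.sample(simulated, n0)   # n0 model planets, without replacement
        values.append(statistic(subset, observed))
    return values

def mean_gap(subset, observed):
    """Hypothetical stand-in for the logit-based probability."""
    return sum(subset) / len(subset) - sum(observed) / len(observed)

# Toy data: the scatter of the statistic shrinks as n0 approaches the pool size.
pool = list(range(1000))
toy_observed = [400, 500, 600]
small = subsample_statistics(pool, toy_observed, 10, mean_gap, repeats=50)
full = subsample_statistics(pool, toy_observed, 1000, mean_gap, repeats=50)
```

Plotting the spread of the returned values against n0 gives the kind of diagnostic shown in Fig. 13.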



6.3.3. Analysis of two CoRoTlux samples



Footnote: The variance of the estimator is provided by the Hessian
∂² log L(Y|X, b)/∂b ∂b'.



Table 4 (see § 2.4) reports the parameter estimates for each of
the planet/star characteristics. We start by assessing the general
quality of the logistic regression by performing the chi-square
test. If the vector of planet characteristics brings little or no
information as to which type of planet a given observation
belongs to, we would expect the logistic regression to perform
badly. In technical terms, we would expect the conditional
probability Pr(Y = 1|X) to be equal to the unconditional probability
Pr(Y = 1). The χ² test described above is used to evaluate the
significance of the model.

We performed several tests: the first column of results in
Table 10 shows the result of a logit analysis with the whole series of
9 explanatory variables. Globally, the model behaves well, with a
likelihood ratio statistic c_ll = 5.8 and a χ² distribution for 9
degrees of freedom yielding a probability P_χ² = 0.758. When
examining individual variables, we find that the lowest probability
derived from the Student test is that on [Fe/H]: P_[Fe/H] = 0.164,
implying that the stellar metallicity is not well reproduced. As
discussed previously, this is due to the fact that several planets
of the observed list have no or very poorly constrained
determination of the stellar [Fe/H], and that a default value of 0
was then used.

The other columns in Table 10 show the result of the logit
analysis when removing one variable (i.e. with only 8 explanatory
variables). In agreement with the above analysis, the highest
global probability P_χ² is obtained for the model without the
[Fe/H] variable. When removing other variables, the results are
very homogeneous, indicating that although the model can
certainly be improved, there is no readily identified problem except
that on [Fe/H]. (We hope that future observations will allow for
better constraints on these stars' metallicities.)

In order to further test the method, we show in Table 11 the
results of an analysis in which the model radii were artificially
augmented by 10%. The corresponding probabilities are
significantly lower: we find that the model can explain the
observations by chance in less than 1 case in 10,000. The
probabilities for each variable are affected as well, so that it is
impossible to identify the culprit for the bad fit with the 9
variables. However, when removing R_p from the analysis sample, the
fit becomes significantly better. (Note that the results for that
column are slightly different from those for the same column in
Table 10 because of the dependence of θ on R_p.)



Table 10. Result of the logit analysis for the fiducial model with 50,000 model planets and 31 observations.

                All       Missing variable
                variables [Fe/H]    T_eff     R_*       M_*       P         R_p       M_p       θ         T_eq
b_[Fe/H]        0.415     --        0.435     0.413     0.430     0.421     0.229     0.385     0.371     0.369
P_[Fe/H]        [0.164]   --        [0.148]   [0.164]   [0.150]   [0.157]   [0.242]   [0.177]   [0.196]   [0.182]
b_Teff          -0.517    -0.579    --        -0.541    -0.191    -0.569    -0.563    -0.541    -0.537    -0.618
P_Teff          [0.417]   [0.366]   --        [0.372]   ...       [0.355]   [0.378]   [0.391]   [0.395]   [0.298]
b_R*            0.059     0.046     0.190     --        0.212     -0.009    0.061     0.027     0.023     -0.063
P_R*            [0.901]   [0.924]   [0.681]   --        ...       ...       [0.898]   [0.953]   [0.961]   [0.871]
b_M*            0.467     0.541     -0.001    0.511     --        0.472     0.524     0.483     0.486     0.497
P_M*            [0.528]   [0.468]   [0.998]   [0.433]   --        ...       [0.481]   [0.512]   [0.508]   [0.499]
b_P             -0.236    -0.288    -0.425    -0.199    -0.250    --        -0.281    -0.269    -0.356    -0.070
P_P             [0.746]   [0.698]   [0.605]   [0.754]   ...       --        [0.706]   [0.708]   [0.618]   [0.883]
b_Rp            0.305     -0.069    0.331     0.305     0.328     0.316     --        0.261     0.246     0.241
P_Rp            [0.370]   [0.778]   [0.332]   [0.370]   [0.336]   [0.352]   --        [0.416]   [0.456]   [0.443]
b_Mp            0.329     -0.032    0.432     0.306     0.379     0.386     0.055     --        -0.229    0.118
P_Mp            [0.726]   [0.968]   [0.656]   [0.737]   [0.693]   ...       [0.947]   --        [0.474]   [0.876]
b_θ             ...       ...       ...       -0.879    -0.971    -1.049    -0.023    ...       --        -0.033
P_θ             [0.563]   [0.706]   [0.540]   [0.567]   [0.548]   [0.497]   [0.658]   [0.410]   --        [0.620]
b_Teq           -0.296    -0.023    -0.520    -0.250    -0.339    -0.169    -0.089    -0.186    -0.150    --
P_Teq           [0.648]   [0.970]   [0.414]   [0.635]   [0.605]   [0.744]   [0.882]   [0.742]   [0.801]   --

overall assessment of the fit
Log likelihood  -257.059  -258.123  -257.410  -257.066  -257.275  -257.129  -257.439  -257.126  -257.316  -257.171
c_ll            5.821     3.692     5.119     5.805     5.389     5.681     5.060     5.687     5.307     5.597
P_χ²            0.758     0.884     0.745     0.669     0.715     0.683     0.751     0.682     0.724     0.692


Table 11. Result of the logit analysis for the altered model (R_p increased by 10%) with 50,000 model planets and 31 observations.

                All       Missing variable
                variables [Fe/H]    T_eff     R_*       M_*       P         R_p       M_p       θ         T_eq
b_[Fe/H]        -0.738    --        -0.740    -0.737    -0.737    -0.733    0.224     -0.607    -0.728    -0.664
P_[Fe/H]        [0.002]   --        [0.002]   [0.002]   [0.002]   [0.002]   [0.251]   [0.009]   [0.002]   [0.005]
b_Teff          -0.729    -0.713    --        -0.742    -0.255    -0.819    -0.573    -0.545    -0.739    -0.308
P_Teff          [0.260]   [0.268]   --        [0.231]   [0.483]   [0.192]   [0.366]   [0.404]   [0.256]   [0.618]
b_R*            0.032     0.013     0.197     --        0.247     -0.091    0.032     0.237     0.018     0.558
P_R*            [0.945]   [0.978]   [0.661]   --        [0.540]   [0.828]   [0.945]   [0.620]   [0.970]   [0.149]
b_M*            0.677     0.650     0.017     0.702     --        0.684     0.532     0.557     0.667     0.598
P_M*            [0.370]   [0.388]   [0.966]   [0.291]   --        [0.363]   [0.472]   [0.461]   [0.377]   [0.430]
b_P             -0.417    -0.356    -0.664    -0.395    -0.432    --        -0.366    -0.393    -0.249    -1.706
P_P             [0.585]   [0.618]   [0.421]   [0.565]   [0.575]   --        [0.622]   [0.641]   [0.716]   [0.037]
b_Rp            -1.986    -1.264    -1.974    -1.986    -1.985    -1.995    --        -1.763    -1.973    -1.796
P_Rp            [0.000]   [0.000]   [0.000]   [0.000]   [0.000]   [0.000]   --        [0.000]   [0.000]   [0.000]
b_Mp            -1.359    -0.894    -1.350    -1.359    -1.354    -1.305    -0.328    --        -1.150    -1.045
P_Mp            [0.001]   [0.019]   [0.002]   [0.001]   [0.001]   [0.001]   [0.558]   --        [0.001]   [0.052]
b_θ             0.384     0.271     0.461     0.376     0.387     0.193     0.021     -1.714    --        0.338
P_θ             [0.359]   [0.541]   [0.327]   [0.347]   [0.372]   [0.494]   [0.976]   [0.009]   --        [0.633]
b_Teq           1.189     0.797     0.940     1.212     1.165     1.439     -0.009    0.603     1.202     --
P_Teq           [0.045]   [0.162]   [0.100]   [0.014]   [0.051]   [0.001]   [0.987]   [0.334]   [0.054]   --

overall assessment of the fit
Log likelihood  -243.645  -247.922  -244.341  -243.648  -244.098  -243.872  -257.580  -246.194  -243.872  -245.271
c_ll            32.647    24.094    31.256    32.643    31.743    32.194    4.778     27.551    32.194    29.395
P_χ²            0.000     0.002     0.000     0.000     0.000     0.000     0.781     0.001     0.000     0.000



